Boosting search by rare events 
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Randomized search algorithms for hard combinatorial problems exhibit a large variability of per- 
formances. We study the different types of rare events which occur in such out-of-equilibrium 
stochastic processes and we show how they cooperate in determining the final distribution of run- 
ning times. As a byproduct of our analysis we show how search algorithms are optimized by random 
restarts. 
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Recent years have witnessed an increasing convergence 
of research themes coming from out-of-equilibrium statis- 
tical physics and computer science or discrete mathemat- 
ics [[l, ||, ||. For instance, giving a 'static' characteriza- 
tion of systems displaying an extremely slow dynamics 
is a central problem both in computer science [0 and 
in spin glass theory The results in these fields are 
strongly focused on the typical properties of large ran- 
dom systems. This approach is justified as long as the 
quantities of interest concentrate in probability around 
some typical value when the size diverges (the so called 
self- averaging property). 

In this letter we provide an analytical and numerical 
study of different types of rare events which occur in the 
time evolution of randomized search algorithms for hard 
optimization problems. As a byproduct of our analy- 
sis, we find a general picture for understanding and opti- 
mizing the introduction of restarts in randomized search 
algorithms. This has recently proven to be an highly 
effective technique for improving such algorithms . 

It is a well known fact (and a basic problem for both 
theoretical and applied computer science) that the so 
called NP-complete JtJ combinatorial decision problems 
might require computational resources that grow expo- 
nentially with the number of variables N needed for their 
encoding. However combinatorial search methods often 
exhibit remarkable variability in performance: it is not 
uncommon to observe a combinatorial method "hang" on 
a given instance of a problem, whereas a different heuris- 
tic algorithm, or even just another stochastic run, solves 
the instance quickly. 

With the aim of clarifying such behavior, in the re- 
cent years there has been an intense research activity on 
randomly generated hard combinatorial problems which 
has lead to the identification of non-trivial problem en- 
sembles Q . Particularly representative and widely stud- 
ied examples are satisfiability of random Boolean expres- 
sions, vertex coloring and covering of random graphs and 
number partitioning [jl[. 

This type of setting gives us much more freedom than 
in a standard physical experiment. Indeed an algorithm 
can be run an exponential number of times with each 
run in turn possibly taking exponential time. In such a 



situation rare events may have dramatic effects and com- 
pletely determine the total computational time and the 
outcome of such random-restart experiments. We will 
show that there exist distinct sources of hardness fluc- 
tuations, static (i.e. intrinsic) and dynamic (algorithm- 
dependent), which account for the variability of resolu- 
tions times. 

While our approach is general and applies to a wide 
class of problems, in what follows we focus on the (NP- 
complete) vertex cover (VC) problem restricted over ran- 
dom graphs. The choice of VC is dictated mainly by its 
relative simplicity. As expected, extensive numerical ex- 
periments match the analytical predictions. 

The action of a backtrack algorithm on combinatorial 
problems resembles decimation flows in statistical me- 
chanics Jl(| . The algorithm proceeds by choosing at ran- 
dom one or more variables at a time and assigning their 
values according to some heuristics. The problem is then 
turned into a sub-problem in which the assigned vari- 
ables act as correlated quenched randomness. This evo- 
lution can be described macroscopically by keeping track 
of a proper set of average quantities a (e.g. the ratio of 
the number of non-satisfied constraints to the number of 
non-assigned variables). The sub-instance generated by 
fixing a fraction t of the N variables may however not 
have any solution (this happens because the algorithm 
made a wrong assignment at an earlier stage). Sooner 
or later the algorithm detects the inexistence of solutions 
compatible with the variables assigned so far and begins 
a backtrack correction process which may take an expo- 
nential number of steps to correct early mistakes. 

The fraction t of assigned variables acts therefore 
as a control parameter and the system undergoes a 
SAT/UNSAT phase transition (i.e. a transition from a 
satisfiable to an un-satisfiable instance of the problem) 
when t crosses some critical value t c ll(J. This corre- 
sponds to the trajectory a(t) crossing a critical surface 
in the static phase diagram at d? c = a(t c ). For simple 
enough randomized algorithms it is possible to compute 
the size e^ *-"- 1 |l^, |ll| of the backtracking tree for an 
UNSAT instance characterized by the parameters a. The 
computational complexity of the algorithm is therefore 
given by exp[(l - t c )tt(a c )N}. 
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How do rare events enter this scenario and how to take 
advantage of their presence? There are at least two com- 
peting phenomena involving large deviations which affect 
the resolution time. 

(I) Let us assume that the trajectory a(t) follows 
the most probable line. Once the SAT/UNSAT criti- 
cal line is crossed there still exists a small probability 
cxp[— N(l — t)ip(a(t))] of having generated a subproblem 
which is solvable. The deeper one goes into the UNSAT 
phase the smaller will be such probability. On the other 
hand such a rare event corresponds to a reduction of or- 
der N of the size of the problem and, therefore, to an 
exponential reduction of the size exp[V(l — t)il(d(t))} of 
the backtracking tree. 

This trade-off can be exploited in a random restart 
algorithm: we interrupt the search after e R backtrack- 
ing steps and re- run it (with different random numbers). 
The probability of finding a solution in one of such runs 
is given by Ps ~ exp[— Vmin t (I — t)ip(a(t))), where 
t is constrained by the fact that the size of the back- 
tracking tree must be smaller than e NTR : this implies 
(1 — t)£l(a(t)) < tr. Assuming that different stochastic 
runs quickly lead to uncorrelated sub-problems, a solu- 
tion is found after Nr 1/Pr restarts. The complexity 
of the algorithm is therefore exp[A r r(r/j)], where 

t(t r ) = t r + min (1 - t)i/j(a(t)) , (f) 
and where the minimizing value of t must satisfy 

(i - t)n(3(t)) < T R . (2) 

(II) The above scenario is however largely incomplete. 
Indeed there exist another, dynamical, source of fluctua- 
tions: O(l) fluctuations with respect to the typical tra- 
jectory (this effect has been studied and pointed out to 
us by S. Cocco and R. Monasson Jl^]). At time t the 
macroscopic parameters take the value a with probabil- 
ity exp[— NI t (d)] (with It{a) = along the typical tra- 
jectory a — d(t)). Again, such a rare event implies an 
exponential change in the computational complexity, and 
the possible gain can be exploited by the random restart 
algorithm. Equation (|l|) must be properly generalized. 
We get 

t(tr) = tr + min {I t (a) + (1 - i)V(S)} , (3) 

t ,a 

always with the constraint Eq.(||). These are not the 
only sources of fluctuations but they give a quite accurate 
picture of the phenomenon. 

Let's apply the above scheme to the case of VC, which 
is at the same time NP-complete and very easy to 
define: Consider an undirected graph G = (V, E) with N 
vertices i 6 V = {1,2, N} and L edges e E C 

V x V. The problem consists in distributing X covering 
marks over the vertices in such a way that every edge 
of the graph is covered, that is it has at least one of its 



ending vertices which is marked. If such covering can be 
found the graph is said to be cover able (COV), otherwise 
it is uncoverable (UNCOV). 

A non trivial ensemble of graphs which captures some 
relevant computational features of VC at the level of typ- 
ical or average cases, is the set of random graphs Gn,l 
with N vertices and L edges (and flat probability distri- 
bution). Similarly to other random NP-complete prob- 
lems jj] , a threshold phenomena occurs as the control pa- 
rameter X is changed |l5|. For a given average connectiv- 
ity c = 2L/(N— 1 ) , when the number X = xN of covering 
marks is lowered the model undergoes a COV/UNCOV 
transition at some critical density of covers x c (c) for 
N — ► oo. Statistical mechanics methods allow for a pre- 
cise estimate of x c (c) 0| and probabilistic tools provide 
rigorous lower and upper bounds for such a threshold 
fl3[ For x > x c (c), vertex covers of size Nx exist 
with probability one, for x < x c (c) the available cov- 
ering marks are not sufficient. The statistical mechan- 
ics analysis is performed by mapping the optimization 
problem onto a zero temperature disordered system with 
Hamiltonian 

ff(M) = £<W> (4) 

i 

where m G {0, 1} (n, = if a mark is put on vertex i) and 
satisfy an excluded volume constraint: if (ij) G G then 
either i%i = or rij = 0. The ground state energy Eqs 
of the model is the minimum number of marks needed 
for covering the graph: for X > Eqs the graph is COV, 
while for X < E GS it is UNCOV. 

It is known experimentally and analytically for some 
algorithms |Tl| that the typical computational cost, given 
e.g. by the number of visited decision nodes in the back- 
tracking tree, becomes exponential for initial conditions 
in a region close or below x c (c), while it remains linear 
well inside the coverable phase, x > x c (c). This easy- 
hard scenario characterizes the typical-case complexity 
pattern found in other NP-complete random ensembles 

We consider the following backtrack algorithm [|9|. 
During the computation, a vertices can be covered, un- 
covered or just free, meaning that the algorithm has not 
yet assigned any value to that vertex. Here is the re- 
cursive procedure: The algorithm chooses a vertex i at 
random among those which are free (at t = all vertices 
are free). If i has neighboring vertices which are either 
free or uncovered, then the vertex i is declared covered 
first. In case i has only covered neighbors, the vertex 
is declared uncovered. The process continues unless the 
number of covered vertices exceeds Nx. If the algorithm 
backtracks, then the opposite choice is taken for the ver- 
tex i unless this corresponds to declaring uncovered a 
vertex whose neighbors are all uncovered. The algorithm 
halts if it finds a solution (and declares the graph to be 
COV) or after exploring all the search tree (in this case 
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it declares the graph to be UNCOV). 

In order to study the algorithm we need to follow the 
variables X, N, L which become time dependent. In one 
time step (T — » T + 1), the probability for a change 
L — > L + AL in the number of links and X — > X + AX 
in the number of available covering marks reads 



Pal,ax 



<5AX,0<5AL,0 + ^ Tj-^AX-l^AL 



fc=l 



(5) 



This defines a Markov process in the space (X,Gm,l) 
which mimics the effects of the algorithm. We want to 
iterate the above step AT times and compute the cor- 
responding transition probability. Let us introduce the 
rescaled time t = T/N (i.e. the fraction of assigned vari- 
ables) and the macroscopic time dependent parameters 
c(t) = 2Lt/Nt and x(t) — Xt/Nt (which we denoted 
collectively by a in the general description of our ap- 
proach). Due to the Markovian structure of this process, 
the probability for a trajectory a(t) = (c(t),x(t)) can be 
written in a path integral form. To the leading order we 
get P[c,a;] = jT>s exp{— N fdtCt(c,x,s)}, where 

C t (c,x,s) = -isd t [(l - t)c] + (-x)log(-x) 
+(l + x) log(l + x) + x log (exp(ce ls ) - l) + c , (6) 

where we used the shorthand: x(t) = (1 — t)x(t). The 
transition probability Pt 1 (co, xq — » C\,x\) is given by the 
corresponding constrained path-integral. Such an inte- 
gral can be computed by saddle-point, leading to an ex- 
plicit formula for the trajectories: 



c(t) - 
x(t) = 



CO 



1 - 

Xq 



t B(l-t) 
-t A~l 



i(i-t) c, 



logC 



1 -t 



AB{1 -t) 



log 



1 



A 1 
-Ae 



l-Ae-Bi 1 -^ 



(7) 
(8) 



where the two integration constants (A and B) must be 
computed from the conditions c{t\) = ci, x(t%) — x\. 
The large deviation functional is It 1 (ci, x%) = j^dtCt{-), 
where the integral is computed along the trajectory 
(|),(|). For A — and B = Co we recover the typical 
trajectory jll| and we get Jt(c, x) = 0. As shown in Fig. 
[j], numerical simulation are in remarkable agreement with 
the analytical predictions. 

A subgraph generated according to the process de- 
scribed above can be still COV (with an exponentially 
small probability) after the trajectory d(t) = (c(t),x(t)) 
has entered the UNSAT phase (i.e. after x(t) < x c (c(t))). 
Repeated restarts can exploit this rare event. The size 
exp[A r (l — t)fl(a(t))] of the backtrack tree at any point 
in the UNCOV region can be computed analytically |0] 
and used in Eq.(||). Hence, in order to evaluate Eq.(^, 
we just need to compute the probability of being COV in 
the UNCOV region, that is we need to know the proba- 
bility distribution of the ground state energy of the model 
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FIG. 1: Dynamical rare events. We consider the probability 
Pt(x) that, after Nt steps we are left with Nx marks, and 
we plot It(x) — — log P t (x)/N. The continuous line is the 
theoretical prediction It(x) = min c It(c, x), while the sym- 
bols are numerical results for N — 100 (empty circles), 200 
(squares), 300 (stars) and 400 (full circles). We used the the 
initial condition Co = 2, xq = 0.5, and t = 0.5. 



([|) . This computation can be carried over by the replica 
method. We notice that the replicated partition function 
averaged over the disorder reads 



dP{E GS )e 



-uiEgs 



(9) 



where one takes the zero temperature limit [3 — > oo keep- 
ing u> = n(3 fixed. As N — > oo, (Z n ) ~ exp[— N(f>(w)] and 
P{E GS ) ~ exp[-Ni/)(xj], where x = E GS /N. We get 
therefore <p(iv) — ijj{x) + ux\ u]= _ d ^. 

The small ui behavior of 4>(ui) yields the typical ground 
state energy and its small fluctuations. The knowledge of 
the whole function 4>(uj) gives the large deviation proper- 
ties of the ground state energy. In general ip(x) is convex 
and has its unique zero at the typical ground state energy 
x = x c (c). The probability that a graph in the ensemble 
is coverable with X — Nx < Nx c (c) marks is given by 
exp[— Nif}(x)]. A replica symmetric calculation yields 



0(w) = c(l-F Q ) 
-log[e- w 



^lOg^Q 

(i-O 



(10) 



where we used the short-hand Fq = [1 + (e u — l)Q 2 } 1 
and Q satisfies the self-consistency equation: Q^ 1 = 
1 - [1 + exp(cF Q Q)] . Figure | gives the geometrical 
picture of a random restart experiment. Quite remark- 
able is the prediction on the (c, x)-values up to which the 
algorithm has to backtrack before finding the solution. 
Such a curve lies well inside the UNCOV region indicat- 
ing that the two types of rare events are both relevant 
for Tfl > 0. In Fig. [| we consider the computational 
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FIG. 2: Random restart experiments with initial condition 
at c = 3.2, x = 0.6 (empty circle). The long dashed line is 
the replica symmetric critical line x = x c (c). The rightmost 
dotted line is the typical trajectory. The leftmost one is the 
rare trajectory followed by the last (successful) restart of the 
algorithm when tr = 0.1. The symbols are numerical re- 
sults for the (c, a;)-values corresponding to the backtrack tree 
generated by the algorithm since the last restart. Triangles, 
squares and stars correspond, respectively, to iV = 30, 60, 
120. The continuous line is the theoretical prediction for the 
same quantity (i.e. the minimizing a = (c, x) in Eq. (g)). 

complexity e Nr ^ TR ^ of the random restart algorithm for 
the initial condition c = 3.2, x = 0.6. Finite size effects 
are important for the achievable sizes of the problem. An 
extrapolation can be done for the smaller values of tr, 
where we were able to run the algorithm on much larger 
systems. 

Building on large deviations results we have shown that 
running times of randomized complete search algorithms 
can be greatly reduced by a restart strategy. The op- 
timal restart rate r^ pt can be computed within our ap- 
proach: for VC we find r^ pt = 0. This result highlights 
the relevance of the sub-exponential regime which has 
been investigated thoroughly in Ref. jl?]]. In more gen- 
eral cases we expect r^ pt > || . In would be interesting 
to explore whether this rare events scenario also applies 
to other classes of stochastic processes, both algorithmic 
and physical. 

Credit for this paper should be shared with S. Cocco 
and R. Monasson for very fruitful conversations concern- 
ing dynamical fluctuations in random 3-SATp7j. We 
also greatly thank A. Braunstein, S. Franz, M. Mezard, 
G. Parisi, N. Sourlas and M. Weigt. 



I — | montanar@lpt.ens.fr 

' zecchina@ictp.trieste.it; Permanent address: ICTP, 
Strada Costiera 11, 1-34100 Trieste, Italy. 



0.25 




0.05 -'' 



0.05 0.1 0.15 0.2 0.25 

\ 

FIG. 3: Typical computational complexity of the random 
restart algorithm. Here we plot the logarithm of number of 
nodes visited by the algorithm divided by the size N, for dif- 
ferent values of the restart rate tr. Symbols refer to N = 30 
(circles), 60 (triangles), and 120 (diamonds). The stars are 
the result of an N — > oo extrapolation. The continuous and 
dashed lines reproduce the theoretical prediction with, cf. Eq. 
(Ej), and without, cf. Eq. (|l|), dynamical rare events. 
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